Statistical distributions of sequencing by synthesis with probabilistic nucleotide incorporation

Author

  • Yong Kong
Abstract

Sequencing by synthesis is used in many next-generation DNA sequencing technologies. Some of these technologies, especially those exploring the principle of single-molecule sequencing, allow incomplete nucleotide incorporation in each cycle. We derive statistical distributions for sequencing by synthesis that take into account the possibility that nucleotide incorporation may not be complete in each flow cycle. The distributions are expressed in terms of the nucleotide probabilities of the target sequences and the incorporation probability of each nucleotide. We give exact distributions both for a fixed number of flow cycles and for a fixed sequence length. Explicit formulas are derived for the mean and variance of these distributions. The results generalize our previous work on pyrosequencing. Incomplete nucleotide incorporation leads to significant changes in the mean and variance of the distributions, but they can still be approximated by normal distributions with the same mean and variance. The results are also generalized to handle sequence-context-dependent incorporation. The statistical distributions will be useful for instrument and software development for sequencing-by-synthesis platforms.
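
As a rough illustration of the fixed-flow-cycle setting described in the abstract, the following Monte Carlo sketch estimates the mean and variance of the sequence length reached after a given number of flow cycles. It is not the paper's exact derivation. The flow order, the i.i.d. target sequence, and the rule that a failed incorporation ends the current flow are all assumptions made here for illustration, and the names `simulate_length` and `length_stats` are hypothetical.

```python
import random
from statistics import mean, pvariance

BASES = "ACGT"
FLOW_ORDER = "ACGT"  # assumption: one flow cycle adds the four nucleotides in this order


def draw_base(rng, base_probs):
    """Draw the next target base, assuming i.i.d. nucleotide probabilities."""
    return rng.choices(BASES, weights=[base_probs[b] for b in BASES])[0]


def simulate_length(cycles, base_probs, inc_probs, rng):
    """Sequence length synthesized after `cycles` flow cycles.

    Assumed model: during the flow of nucleotide b, every consecutive target
    base equal to b is incorporated with probability inc_probs[b]; the first
    failed incorporation ends that flow. At least two bases need nonzero
    probability, otherwise a flow with inc_probs[b] == 1.0 never ends.
    """
    length = 0
    next_base = draw_base(rng, base_probs)
    for _ in range(cycles):
        for flowed in FLOW_ORDER:
            while next_base == flowed and rng.random() < inc_probs[flowed]:
                length += 1
                next_base = draw_base(rng, base_probs)
    return length


def length_stats(cycles, base_probs, inc_probs, trials=2000, seed=0):
    """Empirical mean and variance of the length over `trials` simulated runs."""
    rng = random.Random(seed)
    lengths = [simulate_length(cycles, base_probs, inc_probs, rng)
               for _ in range(trials)]
    return mean(lengths), pvariance(lengths)


if __name__ == "__main__":
    uniform = {b: 0.25 for b in BASES}
    complete = {b: 1.0 for b in BASES}   # pyrosequencing-like: always incorporated
    partial = {b: 0.8 for b in BASES}    # incomplete incorporation
    print("complete:", length_stats(50, uniform, complete))
    print("partial: ", length_stats(50, uniform, partial))
```

Comparing the two printed (mean, variance) pairs gives a feel for how incomplete incorporation shifts the length distribution. A histogram of the simulated lengths could likewise be compared against a normal curve with the same mean and variance.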

Similar resources

Length distribution of sequencing by synthesis: fixed flow cycle model

Sequencing by synthesis is the underlying technology for many next-generation DNA sequencing platforms. We developed a new model, the fixed flow cycle model, to derive the distributions of sequence length for a given number of flow cycles under the general conditions where the nucleotide incorporation is probabilistic and may be incomplete, as in some single-molecule sequencing technologies. U...

Single Nucleotide Polymorphisms (SNPs) of GDF9 Gene in Bahmaei and Lak Ghashghaei Sheep Breeds and Its Association with Litter Size

Growth differentiation factor 9 (GDF9) belongs to the transforming growth factor β superfamily, is highly expressed in the oocytes of growing ovarian follicles, and has been strongly linked to fecundity traits in sheep. The GDF9 gene could therefore serve as a genetic marker for improving reproductive performance in sheep. The aim of this study was to invest...

A Statistical-Probabilistic Pattern for Determination of Tunnel Advance Step by Quantitative Risk Analysis

One of the main challenges in the design and construction phases of tunneling projects is determining the maximum allowable advance step so as to maximize the excavation rate and reduce project delivery time. Given the complexity of determining this factor and the unexpected risks associated with determining it inappropriately, it is necessary to employ a method which is capable of accounti...

Statistical distributions of pyrosequencing

Pyrosequencing is emerging as one of the important next-generation sequencing technologies. We derive the statistical distributions of this technique in terms of nucleotide probabilities of the target sequences. We give exact distributions both for fixed number of flow cycles and for fixed sequence length. Explicit formulas are derived for the mean and variance of these distributions. In both c...

Source Separation By Score Synthesis

A musical score provides a great deal of information about a piece of music. In this paper we consider the incorporation of a music score to guide source separation on a single-channel recording. We propose a method based on synthesizing lines of music in the score. Dynamic time warping (DTW) is used to fit the synthesized data to the recording. These are then used as prior distributions in ...

Journal title:
  • Journal of computational biology : a journal of computational molecular cell biology

Volume 16, Issue 6

Pages: -

Publication date: 2009